feat: Adopt resumption feature of online data mixing #617
Signed-off-by: Mehant Kammakomati <[email protected]>
Thanks for making a pull request! 😃
```python
resume_from_checkpoint = None
if train_args.output_dir:
    os.makedirs(train_args.output_dir, exist_ok=True)
```
Why do we need to create the output directory? Doesn't the trainer create it already?
`get_last_checkpoint` depends on `output_dir` already existing. This code was already there; I just moved it higher up in this function :)
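To illustrate why the directory has to exist first, here is a minimal stdlib-only sketch modeled on the behavior of `get_last_checkpoint` from Hugging Face `transformers.trainer_utils`, which lists the contents of the output directory and picks the `checkpoint-<step>` subfolder with the highest step. The helper names `find_last_checkpoint` and `resolve_resume_checkpoint` are hypothetical and are not the code in this PR.

```python
import os
import re
import tempfile
from typing import Optional

# Assumption: checkpoint folders follow the Trainer's "checkpoint-<step>" naming.
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_last_checkpoint(output_dir: str) -> Optional[str]:
    """Hypothetical helper: return the highest-step checkpoint dir, or None."""
    # os.listdir raises FileNotFoundError if output_dir does not exist yet,
    # which is why the directory is created before this lookup runs.
    candidates = [
        name
        for name in os.listdir(output_dir)
        if _CHECKPOINT_RE.match(name)
        and os.path.isdir(os.path.join(output_dir, name))
    ]
    if not candidates:
        return None
    # Compare steps numerically so that checkpoint-100 beats checkpoint-5.
    latest = max(candidates, key=lambda n: int(_CHECKPOINT_RE.match(n).group(1)))
    return os.path.join(output_dir, latest)


def resolve_resume_checkpoint(output_dir: Optional[str]) -> Optional[str]:
    """Hypothetical helper: create output_dir, then look up the last checkpoint."""
    resume_from_checkpoint = None
    if output_dir:
        os.makedirs(output_dir, exist_ok=True)
        resume_from_checkpoint = find_last_checkpoint(output_dir)
    return resume_from_checkpoint


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "run")
        print(resolve_resume_checkpoint(out))  # None: directory was just created
        for step in (5, 10, 100):
            os.makedirs(os.path.join(out, f"checkpoint-{step}"))
        print(resolve_resume_checkpoint(out))  # path ending in checkpoint-100
```

Without the `os.makedirs` call, the first lookup on a fresh run would fail with `FileNotFoundError` instead of returning `None`.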
This is related to this failing test: https://github.com/foundation-model-stack/fms-hf-tuning/actions/runs/18341924401
Force-pushed from 7df2365 to 82495d3.
Changes
In this PR, we move the construction of `resume_from_checkpoint` to the top of the function so that it can be used in `odm_config`. This PR also depends on the merge of foundation-model-stack/fms-acceleration#155.
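The ordering change can be sketched as follows. This is a minimal illustration only: `OdmConfigSketch` is a hypothetical stand-in for the real `odm_config` (whose fields may differ), and `lookup_last_checkpoint` is a caller-supplied function, for example `transformers.trainer_utils.get_last_checkpoint`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OdmConfigSketch:
    # Hypothetical stand-in for odm_config; the real fields may differ.
    resume_from_checkpoint: Optional[str] = None


def build_odm_config(
    output_dir: Optional[str],
    lookup_last_checkpoint: Callable[[str], Optional[str]],
) -> OdmConfigSketch:
    # 1. Resolve the checkpoint to resume from before any config is built.
    resume_from_checkpoint = None
    if output_dir:
        resume_from_checkpoint = lookup_last_checkpoint(output_dir)
    # 2. Only now construct the ODM config, so the data mixing state can be
    #    resumed from the same checkpoint the trainer resumes from.
    return OdmConfigSketch(resume_from_checkpoint=resume_from_checkpoint)
```

If the config were built before step 1, it would always see `None` and online data mixing would restart from scratch even when the trainer resumes.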